On Clustering Histograms with k-Means by Using Mixed α-Divergences

Authors

  • Frank Nielsen
  • Richard Nock
  • Shun-ichi Amari
Abstract

Clustering sets of histograms has become popular thanks to the success of the generic method of bag-of-X used in text categorization and in visual categorization applications. In this paper, we investigate the use of a parametric family of distortion measures, called the α-divergences, for clustering histograms. Since it usually makes sense to deal with symmetric divergences in information retrieval systems, we symmetrize the α-divergences using the concept of mixed divergences. First, we present a novel extension of k-means clustering to mixed divergences. Second, we extend the k-means++ seeding to mixed α-divergences and report a guaranteed probabilistic bound. Finally, we describe a soft clustering technique for mixed α-divergences.
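The abstract does not spell out the divergences themselves, so the following is only a minimal sketch under stated assumptions. It takes the standard α-divergence on positive arrays for α ≠ ±1, and assumes a mixed divergence of the form λ D(p : q) + (1 − λ) D(q : r), with the symmetrized α-divergence obtained by setting r = p and λ = 1/2. The function names are hypothetical, and histograms are assumed to have strictly positive bins.

```python
import numpy as np


def alpha_divergence(p, q, alpha):
    """Alpha-divergence between positive arrays p and q (assumes alpha != +/-1)."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    if np.isclose(abs(alpha), 1.0):
        # The limits alpha -> +/-1 are Kullback-Leibler-type divergences;
        # they are not handled in this sketch.
        raise ValueError("alpha = +/-1 is not handled in this sketch")
    a = (1.0 - alpha) / 2.0
    b = (1.0 + alpha) / 2.0
    return 4.0 / (1.0 - alpha ** 2) * np.sum(a * p + b * q - p ** a * q ** b)


def mixed_alpha_divergence(p, q, r, lam, alpha):
    """Assumed mixed form: lam * D(p : q) + (1 - lam) * D(q : r)."""
    return (lam * alpha_divergence(p, q, alpha)
            + (1.0 - lam) * alpha_divergence(q, r, alpha))


def symmetrized_alpha_divergence(p, q, alpha):
    """Symmetrized alpha-divergence as the mixed divergence with r = p, lam = 1/2."""
    return mixed_alpha_divergence(p, q, p, 0.5, alpha)


if __name__ == "__main__":
    p = np.array([0.2, 0.3, 0.5])
    q = np.array([0.1, 0.6, 0.3])
    print(alpha_divergence(p, q, 0.5), alpha_divergence(q, p, 0.5))
    print(symmetrized_alpha_divergence(p, q, 0.5))
```

In this sketch the one-sided divergence is generally asymmetric in its arguments, while the symmetrized version returns the same value when p and q are swapped, which is the property the abstract motivates for retrieval settings.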

Similar resources

Non-flat Clustering with Alpha-divergences

The scope of the well-known k-means algorithm has been broadly extended with some recent results: first, the k-means++ initialization method gives some approximation guarantees; second, the Bregman k-means algorithm generalizes the classical algorithm to the large family of Bregman divergences. The Bregman seeding framework combines approximation guarantees with Bregman divergences. We present h...

Reranking with Contextual Dissimilarity Measures from Representational Bregman k-Means

We present a novel reranking framework for Content Based Image Retrieval (CBIR) systems based on contextual dissimilarity measures. Our work revisits and extends the method of Perronnin et al. (Perronnin et al., 2009), which introduces a way to build contexts used in turn to design contextual dissimilarity measures for reranking. Instead of using truncated rank lists from a CBIR engine as contexts...

A Hybrid Data Clustering Algorithm Using Modified Krill Herd Algorithm and K-MEANS

Data clustering is the process of partitioning a set of data objects into meaningful clusters or groups. Due to the vast usage of clustering algorithms in many fields, a lot of research is still going on to find the best and most efficient clustering algorithm. K-means is simple and easy to implement, but it is sensitive to the initialization of cluster centers and hence can get trapped in a local optimum. In this paper...

Mixed Bregman Clustering with Approximation Guarantees

Two recent breakthroughs have dramatically improved the scope and performance of k-means clustering: squared Euclidean seeding for the initialization step, and Bregman clustering for the iterative step. In this paper, we first unite the two frameworks by generalizing the former improvement to Bregman seeding — a biased randomized seeding technique using Bregman divergences — while generalizing ...

Total Jensen divergences: Definition, Properties and k-Means++ Clustering

We present a novel class of divergences induced by a smooth convex function called total Jensen divergences. Those total Jensen divergences are invariant by construction to rotations, a feature yielding regularization of ordinary Jensen divergences by a conformal factor. We analyze the relationships between this novel class of total Jensen divergences and the recently introduced tota...

Journal:
  • Entropy

Volume 16  Issue 

Pages  -

Publication year 2014